Metascheduling of HPC Jobs in Day-Ahead Electricity Markets
High performance grid computing is a key enabler of large scale collaborative
computational science. With the promise of exascale computing, high performance
grid systems are expected to incur electricity bills that grow super-linearly
over time. In order to achieve cost effectiveness in these systems, it is
essential for the scheduling algorithms to exploit electricity price
variations, both in space and time, that are prevalent in the dynamic
electricity price markets. In this paper, we present a metascheduling algorithm
to optimize the placement of jobs in a compute grid which consumes electricity
from the day-ahead wholesale market. We formulate the scheduling problem as a
Minimum Cost Maximum Flow problem and leverage queue waiting time and
electricity price predictions to accurately estimate the cost of job execution
at a system. Using trace-based simulation with real and synthetic workload
traces, and real electricity price data sets, we demonstrate our approach on
two currently operational grids, XSEDE and NorduGrid. Our experimental setup
collectively constitutes more than 433K processors spread across 58 compute
systems in 17 geographically distributed locations. Experiments show that our
approach simultaneously optimizes the total electricity cost and the average
response time of the grid, without being unfair to users of the local batch
systems.
Comment: Appears in IEEE Transactions on Parallel and Distributed Systems
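As a rough illustration of the min-cost max-flow formulation described in the abstract above, the sketch below builds a small flow network with networkx in which edge costs stand in for the estimated cost (predicted queue wait plus electricity price) of running a job at a system. The node layout, cost model, and capacities are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch: job placement as Minimum Cost Maximum Flow (illustrative only).
import networkx as nx

def place_jobs(jobs, systems, est_cost):
    """jobs: list of job ids; systems: dict system -> free slots;
    est_cost[j][s]: assumed estimated cost (queue wait + electricity) of job j on system s."""
    G = nx.DiGraph()
    for j in jobs:
        G.add_edge("src", ("job", j), capacity=1, weight=0)
        for s in systems:
            # Edge cost encodes the estimated execution cost at that system.
            G.add_edge(("job", j), ("sys", s), capacity=1, weight=est_cost[j][s])
    for s, slots in systems.items():
        G.add_edge(("sys", s), "sink", capacity=slots, weight=0)
    flow = nx.max_flow_min_cost(G, "src", "sink")
    # Read the placement back out of the flow assignment.
    return {j: s for j in jobs for s in systems
            if flow[("job", j)].get(("sys", s), 0) == 1}

# Example: two jobs, two systems with one free slot each, integer costs.
print(place_jobs(["j1", "j2"], {"A": 1, "B": 1},
                 {"j1": {"A": 5, "B": 9}, "j2": {"A": 4, "B": 6}}))
```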
Noise-Adaptive Compiler Mappings for Noisy Intermediate-Scale Quantum Computers
A massive gap exists between current quantum computing (QC) prototypes, and
the size and scale required for many proposed QC algorithms. Current QC
implementations are prone to noise and variability which affect their
reliability, and yet with fewer than 80 quantum bits (qubits) in total, they are
too resource-constrained to implement error correction. The term Noisy
Intermediate-Scale Quantum (NISQ) refers to these current and near-term systems
of 1000 qubits or fewer. Given NISQ's severe resource constraints, low
reliability, and high variability in physical characteristics such as coherence
time or error rates, it is of pressing importance to map computations onto them
in ways that use resources efficiently and maximize the likelihood of
successful runs.
This paper proposes and evaluates backend compiler approaches to map and
optimize high-level QC programs to execute with high reliability on NISQ
systems with diverse hardware characteristics. Our techniques all start from an
LLVM intermediate representation of the quantum program (such as would be
generated from high-level QC languages like Scaffold) and generate QC
executables runnable on the IBM Q public QC machine. We then use this framework
to implement and evaluate several optimal and heuristic mapping methods. These
methods vary in how they account for the availability of dynamic machine
calibration data, the relative importance of various noise parameters, the
different possible routing strategies, and the relative importance of
compile-time scalability versus runtime success. Using real-system
measurements, we show that fine-grained spatial and temporal variations in
hardware parameters can be exploited to obtain an average x (and up to
x) improvement in program success rate over the industry-standard IBM
Qiskit compiler.
Comment: To appear in ASPLOS'19
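To make the idea of noise-adaptive mapping concrete, here is a small, purely illustrative greedy heuristic: program qubit pairs that interact most are assigned to the hardware qubit pairs whose calibration data reports the lowest two-qubit error rates. The data structures and policy are assumptions for the sketch, not the paper's actual algorithms; in a real compiler, non-adjacent mapped pairs would still need SWAP routing.

```python
# Illustrative greedy noise-adaptive mapping (not the paper's exact method).
def map_program(cx_counts, coupling_errors):
    """cx_counts: dict (p, q) -> number of CNOTs between program qubits p, q.
    coupling_errors: dict (hw_a, hw_b) -> calibrated two-qubit error rate."""
    mapping, used = {}, set()
    # Handle the most heavily used program pairs first ...
    for (p, q), _ in sorted(cx_counts.items(), key=lambda kv: -kv[1]):
        if p in mapping and q in mapping:
            continue
        # ... and give them the lowest-error hardware link that is still free.
        for (a, b), _ in sorted(coupling_errors.items(), key=lambda kv: kv[1]):
            if a not in used and b not in used:
                mapping.setdefault(p, a)
                mapping.setdefault(q, b)
                used.update((mapping[p], mapping[q]))
                break
    return mapping

# Assumed calibration data for a 4-qubit line; errors are made-up numbers.
calib = {(0, 1): 0.012, (1, 2): 0.031, (2, 3): 0.008}
print(map_program({("a", "b"): 5, ("b", "c"): 2}, calib))
```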
Full-Stack, Real-System Quantum Computer Studies: Architectural Comparisons and Design Insights
In recent years, Quantum Computing (QC) has progressed to the point where
small working prototypes are available for use. Termed Noisy Intermediate-Scale
Quantum (NISQ) computers, these prototypes are too small for large benchmarks
or even for Quantum Error Correction, but they do have sufficient resources to
run small benchmarks, particularly if compiled with optimizations to make use
of scarce qubits and limited operation counts and coherence times. QC has not
yet, however, settled on a particular preferred device implementation
technology, and indeed different NISQ prototypes implement qubits with very
different physical approaches and therefore widely-varying device and machine
characteristics.
Our work performs a full-stack, benchmark-driven hardware-software analysis
of QC systems. We evaluate QC architectural possibilities, software-visible
gates, and software optimizations to tackle fundamental design questions about
gate set choices, communication topology, the factors affecting benchmark
performance and compiler optimizations. In order to answer key cross-technology
and cross-platform design questions, our work has built the first top-to-bottom
toolflow to target different qubit device technologies, including
superconducting and trapped ion qubits which are the current QC front-runners.
We use our toolflow, TriQ, to conduct {\em real-system} measurements on 7
running QC prototypes from three different groups: IBM, Rigetti, and the
University of Maryland. From these real-system experiences at QC's hardware-software
interface, we make observations about native and software-visible gates for
different QC technologies, communication topologies, and the value of
noise-aware compilation even on lower-noise platforms. This is the largest
cross-platform real-system QC study performed thus far; its results have the
potential to inform both QC device and compiler design going forward.
Comment: Preprint of a publication in ISCA 2019
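The kind of cross-platform, noise-aware comparison described above typically rests on an estimate of how likely a compiled circuit is to succeed on a given machine. The sketch below shows one common such metric, the product of calibrated operation and readout fidelities; the field names and numbers are illustrative assumptions, not TriQ's actual cost model.

```python
# Illustrative reliability estimate from calibration data (assumed cost model).
import math

def estimated_success(ops, gate_error, readout_error):
    """ops: list of (gate_name, qubit_tuple); errors come from machine calibration."""
    log_p = 0.0
    for name, qubits in ops:
        log_p += math.log(1.0 - gate_error[(name, qubits)])
    # Each measured qubit also contributes its readout fidelity.
    for q in {q for _, qs in ops for q in qs}:
        log_p += math.log(1.0 - readout_error[q])
    return math.exp(log_p)

# Made-up calibration numbers for a two-qubit example.
gate_error = {("cx", (0, 1)): 0.02, ("x", (0,)): 0.001}
readout_error = {0: 0.03, 1: 0.04}
print(estimated_success([("x", (0,)), ("cx", (0, 1))], gate_error, readout_error))
```

A compiler comparing candidate mappings or target machines can simply pick the one with the highest estimate under this kind of model.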
On Optimizing Distributed Tucker Decomposition for Dense Tensors
The Tucker decomposition expresses a given tensor as the product of a small
core tensor and a set of factor matrices. Apart from providing data
compression, the construction is useful in performing analysis such as
principal component analysis (PCA) and finds applications in diverse domains
such as signal processing, computer vision and text analytics. Our objective is
to develop an efficient distributed implementation for the case of dense
tensors. The implementation is based on the HOOI (Higher-Order Orthogonal
Iteration) procedure, wherein the tensor-times-matrix product forms the core
routine. Prior work has proposed heuristics for reducing the computational
load and communication volume incurred by the routine. We study the two metrics
in a formal and systematic manner, and design strategies that are optimal under
the two fundamental metrics. Our experimental evaluation on a large benchmark
of tensors shows that the optimal strategies provide significant reduction in
load and volume compared to prior heuristics, and provide up to 7x speed-up in
the overall running time.
Comment: Preliminary version of the paper appears in the proceedings of IPDPS'17
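For reference, here is a minimal single-node sketch of the HOOI procedure around which the distributed scheme above is built. The tensor-times-matrix (TTM) products dominate the cost; the paper optimizes how those products are partitioned and communicated, which this sketch deliberately does not attempt.

```python
# Minimal single-node HOOI sketch with NumPy (no distribution, no optimization).
import numpy as np

def ttm(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    T = np.moveaxis(T, mode, 0)
    out = (M @ T.reshape(T.shape[0], -1)).reshape((M.shape[0],) + T.shape[1:])
    return np.moveaxis(out, 0, mode)

def hooi(T, ranks, iters=10):
    # Initialize factors with truncated SVDs of the mode unfoldings (HOSVD).
    U = [np.linalg.svd(np.moveaxis(T, n, 0).reshape(T.shape[n], -1),
                       full_matrices=False)[0][:, :r] for n, r in enumerate(ranks)]
    for _ in range(iters):
        for n in range(T.ndim):
            # Contract T with all factors except mode n, then update factor n.
            Y = T
            for m in range(T.ndim):
                if m != n:
                    Y = ttm(Y, U[m].T, m)
            U[n] = np.linalg.svd(np.moveaxis(Y, n, 0).reshape(T.shape[n], -1),
                                 full_matrices=False)[0][:, :ranks[n]]
    core = T
    for m in range(T.ndim):
        core = ttm(core, U[m].T, m)
    return core, U

T = np.random.rand(8, 9, 10)
core, factors = hooi(T, (3, 3, 3))
print(core.shape, [u.shape for u in factors])
```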
Formal Constraint-based Compilation for Noisy Intermediate-Scale Quantum Systems
Noisy, intermediate-scale quantum (NISQ) systems are expected to have a few
hundred qubits, minimal or no error correction, limited connectivity and limits
on the number of gates that can be performed within the short coherence window
of the machine. The past decade's research on quantum programming languages and
compilers is directed towards large systems with thousands of qubits. For near
term quantum systems, it is crucial to design tool flows which make efficient
use of the hardware resources without sacrificing the ease and portability of a
high-level programming environment. In this paper, we present a compiler for
the Scaffold quantum programming language in which aggressive optimization
specifically targets NISQ machines with hundreds of qubits. Our compiler
extracts gates from a Scaffold program, and formulates a constrained
optimization problem which considers both program characteristics and machine
constraints. Using the Z3 SMT solver, the compiler maps program qubits to
hardware qubits, schedules gates, and inserts CNOT routing operations while
optimizing the overall execution time. The output of the optimization is used
to produce target code in the OpenQASM language, which can be executed on
existing quantum hardware such as the 16-qubit IBM machine. Using real and
synthetic benchmarks, we show that it is feasible to synthesize near-optimal
compiled code for current and small NISQ systems. For large programs and
machine sizes, the SMT optimization approach can be used to synthesize compiled
code that is guaranteed to finish within the coherence window of the machine.
Comment: Invited paper in the Special Issue on Quantum Computer Architecture: a full-stack overview, Microprocessors and Microsystems
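As a toy illustration of the constraint-based style described above, the snippet below asks the Z3 SMT solver for an assignment of program qubits to hardware qubits such that every program CNOT pair lands on a hardware edge. The tiny coupling graph and variable names are assumptions for the sketch; the actual compiler additionally schedules gates, inserts CNOT routing, and optimizes execution time.

```python
# Illustrative Z3 sketch: qubit placement as a satisfiability problem.
from z3 import Int, Solver, Distinct, Or, And, sat

hw_edges = [(0, 1), (1, 2), (2, 3)]          # assumed linear coupling graph
program_cnots = [("a", "b"), ("b", "c")]     # program-level two-qubit gates
prog_qubits = sorted({q for pair in program_cnots for q in pair})

loc = {q: Int(f"loc_{q}") for q in prog_qubits}
s = Solver()
s.add([And(loc[q] >= 0, loc[q] <= 3) for q in prog_qubits])   # stay on hardware
s.add(Distinct([loc[q] for q in prog_qubits]))                 # one qubit per location
for p, q in program_cnots:
    # The mapped pair must sit on some hardware edge (in either direction).
    s.add(Or([Or(And(loc[p] == a, loc[q] == b), And(loc[p] == b, loc[q] == a))
              for a, b in hw_edges]))

if s.check() == sat:
    m = s.model()
    print({q: m[loc[q]].as_long() for q in prog_qubits})
```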
Byzantine-Resilient Federated Learning with Heterogeneous Data Distribution
For mitigating Byzantine behaviors in federated learning (FL), most
state-of-the-art approaches, such as Bulyan, tend to leverage the similarity of
updates from the benign clients. However, in many practical FL scenarios, data
is non-IID across clients, thus the updates received from even the benign
clients are quite dissimilar. Hence, using similarity-based methods results in
wasted opportunities to train a model on interesting non-IID data, as well as
slower model convergence. We propose DiverseFL to overcome this challenge in
heterogeneous data distribution settings. Rather than comparing each client's
update with other client updates to detect Byzantine clients, DiverseFL
compares each client's update with a guiding update of that client. Any client
whose update diverges from its associated guiding update is then tagged as a
Byzantine node. The FL server in DiverseFL computes the guiding update in every
round for each client over a small sample of the client's local data that is
received only once before the start of training. However, sharing even a small
sample of a client's data with the FL server can compromise the client's data
privacy. To tackle this challenge, DiverseFL creates a Trusted Execution
Environment (TEE)-based enclave to receive each client's sample and to compute
its guiding updates. The TEE provides hardware-assisted verification and
attestation to each client that its data is not leaked outside of the TEE. Through
experiments involving neural networks, benchmark datasets and popular Byzantine
attacks, we demonstrate that DiverseFL not only performs Byzantine mitigation
quite effectively, but also almost matches the performance of OracleSGD, in which
the server aggregates updates only from the benign clients.
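The per-client check described above can be sketched as follows: compare each client's submitted update with the "guiding" update computed (inside the TEE) on that client's small data sample, and flag clients whose updates diverge. The cosine-similarity test and threshold below are illustrative assumptions, not necessarily DiverseFL's exact rule.

```python
# Illustrative divergence check between client updates and their guiding updates.
import numpy as np

def flag_byzantine(client_updates, guiding_updates, threshold=0.0):
    """Both dicts map client id -> flattened update vector (np.ndarray)."""
    flagged = set()
    for cid, upd in client_updates.items():
        guide = guiding_updates[cid]
        cos = float(upd @ guide) / (np.linalg.norm(upd) * np.linalg.norm(guide) + 1e-12)
        # A benign client's update should point roughly the same way as its guide.
        if cos < threshold:
            flagged.add(cid)
    return flagged

updates = {"c1": np.array([1.0, 0.9]), "c2": np.array([-1.0, -1.1])}
guides  = {"c1": np.array([1.0, 1.0]), "c2": np.array([1.0, 1.0])}
print(flag_byzantine(updates, guides))   # expect {'c2'} to be flagged
```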
Effect of Fibres on the Behaviour of Bottle-Shaped Struts
Bottle-shaped struts are critical elements in the design of D-regions using the Strut-and-Tie method. Transverse tension develops due to the dispersion of the compression load, which leads to splitting cracks in the bottle-shaped strut. As a result, the strut fails before reaching its ultimate compression capacity. Higher resistance to transverse tension can improve the strut capacity and its efficiency in load transfer. Transverse tensile stress can be resisted by providing steel reinforcement or by adding discrete fibres to the concrete. Many international codes, such as ACI, AASHTO, and CSA, provide guidelines on the addition of steel fibre reinforcement in bottle-shaped struts for resisting transverse tension. Still, the influence of discrete fibres on the performance of bottle-shaped struts is not well established. The performance of the bottle-shaped strut in terms of efficiency factors, crack pattern and failure mode for different amounts of macro steel fibres and micro polypropylene fibres is studied through experimental investigation. Specimens of 600 mm x 600 mm x 100 mm size were tested under compression. Steel fibres were added to the concrete in volume fractions of 0.7%, 0.9% and 1.1%. The effect of fibre hybridization was also studied by adding micro polypropylene fibres in proportions of 1% and 2% in addition to the steel fibres. Experimental results showed that adding discrete fibres to the concrete significantly improved the resistance to transverse tension in bottle-shaped struts and increased the load-carrying capacity of the specimens. A 75% improvement in the efficiency factor was observed at 0.9% volume of steel fibre addition. The addition of micro polypropylene fibres to the macro steel fibres further enhanced the load-carrying capacity of the bottle-shaped struts. Microfibres in the concrete effectively arrested micro-cracks and delayed the occurrence of the first splitting crack in the strut region. Due to this, the mode of failure became ductile, with a larger number of small cracks of smaller width forming at the ultimate load. The results of this study clearly show that adding discrete fibres to the concrete is an effective way to improve the performance of bottle-shaped struts in terms of ultimate strength and serviceability. © 2021 Institute of Physics Publishing
Optical detection of the structural properties of tumor tissue generated by xenografting of drug-sensitive and drug-resistant cancer cells using partial wave spectroscopy (PWS)
A mesoscopic physics-based optical imaging technique, partial wave spectroscopy (PWS), has been used for the detection of cancer by probing nanoscale structural alterations in cells/tissue. The development of drug-resistant cancer cells/tissues during chemotherapy is a major challenge in cancer treatment. In this paper, the structural properties of tumor tissue grown in 3D by xenografting drug-resistant and drug-sensitive human prostate cancer cells (originally cultured in 2D) are studied using a mouse model and PWS. The results show that the 3D xenografted tissues maintain a hierarchy of the degree of structural disorder similar to that of the original 2D drug-sensitive and drug-resistant cells.